In this test run, we will perform portfolio analysis, visualization, and optimization. A portfolio is a collection of financial assets such as stocks, bonds, cash, and real estate. The purpose of a portfolio is to diversify the investments and manage risks. Portfolio optimization is the process of selecting the optimal portfolio out of a set of available portfolios to maximize returns and reduce risks.
Key Definitions:
- Open - The opening price at which a security first trades when the stock market opens.
- High - The highest price at which the security is traded during a regular trading session.
- Low - The lowest price at which the security is traded during a regular trading session.
- Close - The closing or the last price at which the security trades during a regular trading session.
- Volume - No. of shares traded between the market open and close.
- Adjusted Close - adjusted stock close price after considering stock splits and dividends.
# IMPORT LIBRARIES AND DATASETS
# Import key librares and modules
import pandas as pd
import numpy as np
# Import datetime module that comes pre-installed in Python
# datetime offers classes that work with date & time information
import datetime as dt
# Use Pandas to read stock data
stock_amzn = pd.read_csv('AMZN01.csv')
stock_amzn.head(241)
| Date | Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|---|
| 0 | 2004-08-01 | 1.922500 | 2.041500 | 1.742500 | 1.907000 | 1.907000 | 3600448000 |
| 1 | 2004-09-01 | 1.911500 | 2.214000 | 1.868000 | 2.043000 | 2.043000 | 3523538000 |
| 2 | 2004-10-01 | 2.064500 | 2.135000 | 1.650000 | 1.706500 | 1.706500 | 4171368000 |
| 3 | 2004-11-01 | 1.711000 | 2.075000 | 1.707500 | 1.984000 | 1.984000 | 3369498000 |
| 4 | 2004-12-01 | 2.000000 | 2.284000 | 1.899500 | 2.214500 | 2.214500 | 3520296000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 236 | 2024-04-01 | 180.789993 | 189.770004 | 166.320007 | 175.000000 | 175.000000 | 917021100 |
| 237 | 2024-05-01 | 181.639999 | 191.699997 | 173.869995 | 176.440002 | 176.440002 | 892301700 |
| 238 | 2024-06-01 | 177.699997 | 199.839996 | 175.919998 | 193.250000 | 193.250000 | 813276000 |
| 239 | 2024-07-01 | 193.490005 | 201.199997 | 192.820007 | 195.050003 | 195.050003 | 303090400 |
| 240 | 2024-07-11 | 200.089996 | 200.269897 | 192.860001 | 195.050003 | 195.050003 | 43773883 |
241 rows × 7 columns
# Count the number of missing values in "stock_amzn" Pandas DataFrame
stock_amzn.isnull().sum()
Date 0 Open 0 High 0 Low 0 Close 0 Adj Close 0 Volume 0 dtype: int64
# Obtain information about the Pandas DataFrame such as data types, memory utilization..etc
stock_amzn.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 241 entries, 0 to 240 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 241 non-null object 1 Open 241 non-null float64 2 High 241 non-null float64 3 Low 241 non-null float64 4 Close 241 non-null float64 5 Adj Close 241 non-null float64 6 Volume 241 non-null int64 dtypes: float64(5), int64(1), object(1) memory usage: 13.3+ KB
Calculating asset returns is a fundamental step in portfolio optimization. Asset returns can be calculated in several ways depending on the type of return. Simple returns (also known as arithmetic returns) are calculated as the percentage change in the price of an asset over a given period. The formula for simple returns is: Rt = (Pt - Pt-1) / Pt-1
where: Rt = Return at time t. Pt = Price of the security at time t. Pt-1 = Price of the security at time t-1.
# Calculate the percentage monthly return
stock_amzn['Monthly Return'] = stock_amzn['Adj Close'].pct_change(1) * 100
stock_amzn
| Date | Open | High | Low | Close | Adj Close | Volume | Monthly Return | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2004-08-01 | 1.922500 | 2.041500 | 1.742500 | 1.907000 | 1.907000 | 3600448000 | NaN |
| 1 | 2004-09-01 | 1.911500 | 2.214000 | 1.868000 | 2.043000 | 2.043000 | 3523538000 | 7.131620 |
| 2 | 2004-10-01 | 2.064500 | 2.135000 | 1.650000 | 1.706500 | 1.706500 | 4171368000 | -16.470876 |
| 3 | 2004-11-01 | 1.711000 | 2.075000 | 1.707500 | 1.984000 | 1.984000 | 3369498000 | 16.261354 |
| 4 | 2004-12-01 | 2.000000 | 2.284000 | 1.899500 | 2.214500 | 2.214500 | 3520296000 | 11.617944 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 236 | 2024-04-01 | 180.789993 | 189.770004 | 166.320007 | 175.000000 | 175.000000 | 917021100 | -2.982595 |
| 237 | 2024-05-01 | 181.639999 | 191.699997 | 173.869995 | 176.440002 | 176.440002 | 892301700 | 0.822858 |
| 238 | 2024-06-01 | 177.699997 | 199.839996 | 175.919998 | 193.250000 | 193.250000 | 813276000 | 9.527317 |
| 239 | 2024-07-01 | 193.490005 | 201.199997 | 192.820007 | 195.050003 | 195.050003 | 303090400 | 0.931438 |
| 240 | 2024-07-11 | 200.089996 | 200.269897 | 192.860001 | 195.050003 | 195.050003 | 43773883 | 0.000000 |
241 rows × 8 columns
# We have noticed "NaN" in the first row. Let's replace the first row with zeros instead of NaN
stock_amzn.replace({'Daily Return': 0}, inplace = True)
stock_amzn
| Date | Open | High | Low | Close | Adj Close | Volume | Monthly Return | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2004-08-01 | 1.922500 | 2.041500 | 1.742500 | 1.907000 | 1.907000 | 3600448000 | 0.000000 |
| 1 | 2004-09-01 | 1.911500 | 2.214000 | 1.868000 | 2.043000 | 2.043000 | 3523538000 | 7.131620 |
| 2 | 2004-10-01 | 2.064500 | 2.135000 | 1.650000 | 1.706500 | 1.706500 | 4171368000 | -16.470876 |
| 3 | 2004-11-01 | 1.711000 | 2.075000 | 1.707500 | 1.984000 | 1.984000 | 3369498000 | 16.261354 |
| 4 | 2004-12-01 | 2.000000 | 2.284000 | 1.899500 | 2.214500 | 2.214500 | 3520296000 | 11.617944 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 236 | 2024-04-01 | 180.789993 | 189.770004 | 166.320007 | 175.000000 | 175.000000 | 917021100 | -2.982595 |
| 237 | 2024-05-01 | 181.639999 | 191.699997 | 173.869995 | 176.440002 | 176.440002 | 892301700 | 0.822858 |
| 238 | 2024-06-01 | 177.699997 | 199.839996 | 175.919998 | 193.250000 | 193.250000 | 813276000 | 9.527317 |
| 239 | 2024-07-01 | 193.490005 | 201.199997 | 192.820007 | 195.050003 | 195.050003 | 303090400 | 0.931438 |
| 240 | 2024-07-11 | 200.089996 | 200.269897 | 192.860001 | 195.050003 | 195.050003 | 43773883 | 0.000000 |
241 rows × 8 columns
# Use the describe() method to obtain a statistical summary about the data
# Over the specified time period, the average adjusted close price for Amazon stock was $51.159
# The maximum adjusted close price was $195.05
# The maximum volume of shares traded on one day were 32,76,182,000
stock_amzn.describe().round(2)
| Open | High | Low | Close | Adj Close | Volume | Monthly Return | |
|---|---|---|---|---|---|---|---|
| count | 241.00 | 241.00 | 241.00 | 241.00 | 241.00 | 2.410000e+02 | 241.00 |
| mean | 50.52 | 54.24 | 47.31 | 51.16 | 51.16 | 2.248021e+09 | 2.46 |
| std | 57.87 | 61.83 | 54.30 | 58.22 | 58.22 | 1.150596e+09 | 10.38 |
| min | 1.33 | 1.55 | 1.29 | 1.34 | 1.34 | 4.377388e+07 | -30.48 |
| 25% | 4.62 | 4.87 | 4.16 | 4.66 | 4.66 | 1.432892e+09 | -4.30 |
| 50% | 17.00 | 18.29 | 15.88 | 17.73 | 17.73 | 1.888910e+09 | 2.30 |
| 75% | 90.22 | 97.82 | 85.87 | 92.39 | 92.39 | 2.843062e+09 | 8.23 |
| max | 200.09 | 201.20 | 192.86 | 195.05 | 195.05 | 6.925740e+09 | 54.13 |
As we have considered a single stock (Amazon), let's perform some data visualization on the same.
# Matplotlib is a comprehensive data visualization library in Python
# Seaborn is a visualization library that sits on top of matplotlib and offers enhanced features
# plotly.express module contains functions that can create interactive figures using a few lines of code
import matplotlib.pyplot as plt
!pip install seaborn
import seaborn as sns
import plotly.express as px
Requirement already satisfied: seaborn in c:\users\ok\documents\python_working_files\lib\site-packages (0.13.2) Requirement already satisfied: numpy!=1.24.0,>=1.20 in c:\users\ok\documents\python_working_files\lib\site-packages (from seaborn) (1.26.4) Requirement already satisfied: pandas>=1.2 in c:\users\ok\documents\python_working_files\lib\site-packages (from seaborn) (2.2.2) Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in c:\users\ok\documents\python_working_files\lib\site-packages (from seaborn) (3.8.4) Requirement already satisfied: contourpy>=1.0.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.2.0) Requirement already satisfied: cycler>=0.10 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.11.0) Requirement already satisfied: fonttools>=4.22.0 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.51.0) Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.4) Requirement already satisfied: packaging>=20.0 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (23.2) Requirement already satisfied: pillow>=8 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (10.3.0) Requirement already satisfied: pyparsing>=2.3.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.0.9) Requirement already satisfied: python-dateutil>=2.7 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from pandas>=1.2->seaborn) (2024.1) Requirement already satisfied: tzdata>=2022.7 in c:\users\ok\documents\python_working_files\lib\site-packages (from pandas>=1.2->seaborn) (2023.3) Requirement already satisfied: six>=1.5 in c:\users\ok\documents\python_working_files\lib\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.16.0)
# Let's plot a Line Plot Using Plotly Express
fig = px.line(title = 'Amazon.com, Inc. (AMZN) Adjusted Closing Price [$]')
fig.add_scatter(x = stock_amzn['Date'], y = stock_amzn['Adj Close'], name = 'Adj Close')
stock_amzn
| Date | Open | High | Low | Close | Adj Close | Volume | Monthly Return | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2004-08-01 | 1.922500 | 2.041500 | 1.742500 | 1.907000 | 1.907000 | 3600448000 | 0.000000 |
| 1 | 2004-09-01 | 1.911500 | 2.214000 | 1.868000 | 2.043000 | 2.043000 | 3523538000 | 7.131620 |
| 2 | 2004-10-01 | 2.064500 | 2.135000 | 1.650000 | 1.706500 | 1.706500 | 4171368000 | -16.470876 |
| 3 | 2004-11-01 | 1.711000 | 2.075000 | 1.707500 | 1.984000 | 1.984000 | 3369498000 | 16.261354 |
| 4 | 2004-12-01 | 2.000000 | 2.284000 | 1.899500 | 2.214500 | 2.214500 | 3520296000 | 11.617944 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 236 | 2024-04-01 | 180.789993 | 189.770004 | 166.320007 | 175.000000 | 175.000000 | 917021100 | -2.982595 |
| 237 | 2024-05-01 | 181.639999 | 191.699997 | 173.869995 | 176.440002 | 176.440002 | 892301700 | 0.822858 |
| 238 | 2024-06-01 | 177.699997 | 199.839996 | 175.919998 | 193.250000 | 193.250000 | 813276000 | 9.527317 |
| 239 | 2024-07-01 | 193.490005 | 201.199997 | 192.820007 | 195.050003 | 195.050003 | 303090400 | 0.931438 |
| 240 | 2024-07-11 | 200.089996 | 200.269897 | 192.860001 | 195.050003 | 195.050003 | 43773883 | 0.000000 |
241 rows × 8 columns
# This function coded below performs interactive data visualization using Plotly Express
def plot_financial_data(amzn, title):
fig = px.line(title = title)
# For loop that plots all stock prices in the pandas dataframe pd
# Note that index starts with 1 because we want to skip the date column
for i in amzn.columns[1:]:
fig.add_scatter(x = amzn['Date'], y = amzn[i], name = i)
fig.update_traces(line_width = 5)
fig.update_layout({'plot_bgcolor': "white"})
fig.show()
# Plot High, Low, Open, Close and Adj Close
plot_financial_data(stock_amzn.drop(['Volume', 'Monthly Return'], axis = 1), 'Amazon.com, Inc. (AMZN) Stock Price [$]')
# Plot trading volume
plot_financial_data(stock_amzn.iloc[:,[0,5]], 'Amazon.com, Inc. (AMZN) Trading Volume')
# Plot % Monthly Returns
plot_financial_data(stock_amzn.iloc[:,[0,7]], 'Amazon.com, Inc. (AMZN) Percentage Monthly Return [%]')
# Plot the stocks monthly returns
plot_financial_data(stock_data, 'Percentage Monthly Returns [%]')
import pandas as pd
import yfinance as yf # Import yfinance library for stock data
import numpy as np
# Define a list of stock tickers we are interested in.
tickers = ['ADBE', 'AMZN', 'CAT', 'DE', 'EXC', 'JNJ', 'JPM', 'MSFT', 'PFE', 'PG']
# Create an empty DataFrame to store adjusted close prices
adj_close_data = pd.DataFrame()
# Download the adjusted close prices for each stock
for ticker in tickers:
# Fetch the data from Yahoo Finance
data = yf.download(ticker, start='2004-01-01', end='2024-01-01', progress=False)
# Select only the Adjusted Close prices and rename the column to the ticker symbol
adj_close_data[ticker] = data['Adj Close']
print("Adjusted Close Prices DataFrame:")
print(adj_close_data.head())
# Calculate the percentage monthly returns
monthly_returns_df = adj_close_data.resample('M').ffill().pct_change() * 100 # Resample to monthly frequency, forward-fill missing data, calculate percentage change
# Replace NaN values with 0 (if any, but there shouldn't be any after pct_change)
monthly_returns_df.replace(np.nan, 0, inplace=True)
print("\nMonthly Returns DataFrame:")
print(monthly_returns_df.head())
Adjusted Close Prices DataFrame:
ADBE AMZN CAT DE EXC JNJ \
Date
2004-01-02 19.518396 2.5950 24.635588 21.555693 10.938865 29.117960
2004-01-05 19.867832 2.6635 25.306253 21.585871 10.988595 29.287048
2004-01-06 19.887802 2.6515 24.859148 20.985796 10.909038 29.157417
2004-01-07 19.694548 2.5950 24.587902 21.522177 10.975337 29.168690
2004-01-08 18.975481 2.5120 24.397137 21.367962 10.976991 29.303968
JPM MSFT PFE PG
Date
2004-01-02 20.959133 17.105961 15.121106 28.238073
2004-01-05 20.919071 17.535954 15.525189 28.377855
2004-01-06 21.445629 17.598263 15.516676 28.235224
2004-01-07 21.760416 17.579573 15.648537 27.984182
2004-01-08 22.132431 17.548412 15.503916 28.278000
Monthly Returns DataFrame:
ADBE AMZN CAT DE EXC JNJ \
Date
2004-01-31 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2004-02-29 -2.741515 -14.662695 -3.046225 2.603812 1.072320 1.362429
2004-03-31 5.537608 0.627760 4.382843 8.353766 2.576653 -5.917244
2004-04-30 5.597969 0.739373 -1.244059 -1.832327 -2.802329 6.526017
2004-05-31 7.542179 11.238527 -3.061856 -3.439193 0.365505 3.647175
JPM MSFT PFE PG
Date
2004-01-31 0.000000 0.000000 0.000000 0.000000
2004-02-29 5.476989 -4.050699 0.503663 1.414769
2004-03-31 2.267181 -6.030919 -4.365555 2.311918
2004-04-30 -9.642520 4.813523 2.025696 1.352106
2004-05-31 -2.021284 0.382667 -0.704461 1.918878
C:\Users\ok\AppData\Local\Temp\ipykernel_1956\1710146094.py:23: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
monthly_returns_df = adj_close_data.resample('M').ffill().pct_change() * 100 # Resample to monthly frequency, forward-fill missing data, calculate percentage change
# Basic Analysis
# Calculate and print the average monthly return for each stock
average_monthly_returns = monthly_returns_df.mean()
print("\nAverage Monthly Returns:")
print(average_monthly_returns)
# Calculate and print the standard deviation of the monthly returns for each stock
std_deviation_monthly_returns = monthly_returns_df.std()
print("\nStandard Deviation of Monthly Returns:")
print(std_deviation_monthly_returns)
Average Monthly Returns: ADBE 1.832078 AMZN 2.271867 CAT 1.476629 DE 1.569282 EXC 0.631864 JNJ 0.771122 JPM 1.148342 MSFT 1.498255 PFE 0.424252 PG 0.771014 dtype: float64 Standard Deviation of Monthly Returns: ADBE 8.742673 AMZN 10.639783 CAT 9.142596 DE 8.218994 EXC 5.450420 JNJ 4.204065 JPM 7.789727 MSFT 6.490790 PFE 6.052442 PG 4.457197 dtype: float64
import matplotlib.pyplot as plt
import seaborn as sns
# Plot the percentage monthly returns
plt.figure(figsize=(12, 8))
monthly_returns_df.plot(title='Percentage Monthly Returns for Stocks', xlabel='Date', ylabel='Monthly Return (%)', marker='o')
plt.legend(tickers, loc='best')
plt.show()
# Plot the heatmap of the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(monthly_returns_df.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1, center=0)
plt.title('Correlation Heatmap of Monthly Returns')
plt.show()
<Figure size 1200x800 with 0 Axes>
adj_close_data.reset_index(inplace=True)
monthly_returns_df = adj_close_data.set_index('Date').resample('M').ffill().pct_change() * 100 # Resample to monthly frequency, forward-fill missing data, calculate percentage change
monthly_returns_df.replace(np.nan, 0, inplace=True) # Replace NaN values with 0
C:\Users\ok\AppData\Local\Temp\ipykernel_1956\2594310399.py:1: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
monthly_returns_df = adj_close_data.set_index('Date').resample('M').ffill().pct_change() * 100 # Resample to monthly frequency, forward-fill missing data, calculate percentage change
monthly_returns_df()
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[9], line 1 ----> 1 monthly_returns_df() TypeError: 'DataFrame' object is not callable
monthly_returns_df
| ADBE | AMZN | CAT | DE | EXC | JNJ | JPM | MSFT | PFE | PG | |
|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||
| 2004-01-31 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 2004-02-29 | -2.741515 | -14.662695 | -3.046225 | 2.603812 | 1.072320 | 1.362429 | 5.476989 | -4.050699 | 0.503663 | 1.414769 |
| 2004-03-31 | 5.537608 | 0.627760 | 4.382843 | 8.353766 | 2.576653 | -5.917244 | 2.267181 | -6.030919 | -4.365555 | 2.311918 |
| 2004-04-30 | 5.597969 | 0.739373 | -1.244059 | -1.832327 | -2.802329 | 6.526017 | -9.642520 | 4.813523 | 2.025696 | 1.352106 |
| 2004-05-31 | 7.542179 | 11.238527 | -3.061856 | -3.439193 | 0.365505 | 3.647175 | -2.021284 | 0.382667 | -0.704461 | 1.918878 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-08-31 | 2.411345 | 3.239080 | 6.018774 | -4.343588 | -3.297341 | -2.791216 | -7.362619 | -2.222562 | -1.885761 | -1.254008 |
| 2023-09-30 | -8.838994 | -7.890727 | -2.891901 | -7.842950 | -5.807569 | -3.667743 | -0.895250 | -3.664287 | -6.246478 | -5.494355 |
| 2023-10-31 | 4.345951 | 4.696345 | -16.769947 | -3.185119 | 3.043130 | -4.757630 | -3.402213 | 7.081556 | -7.868551 | 3.508122 |
| 2023-11-30 | 14.838554 | 9.767827 | 10.913513 | -0.260012 | -0.178091 | 5.095414 | 12.239306 | 12.294541 | 1.049772 | 2.326201 |
| 2023-12-31 | -2.358396 | 4.004385 | 17.928364 | 10.133882 | -6.777457 | 1.344897 | 8.982584 | -0.757439 | -5.513621 | -4.546648 |
240 rows × 10 columns
# Insert the Date column at the start of the monthly_returns_df DataFrame
monthly_returns_df.insert(0, 'Date', monthly_returns_df.index) # Use the index as the Date
# Define a function that performs interactive data visualization using Plotly Express
import plotly.express as px
def plot_financial_data(df, title):
fig = px.line(title = title)
# For loop that plots all stock prices in the pandas dataframe df
# Note that index starts with 1 because we want to skip the date column
for i in df.columns[1:]:
fig.add_scatter(x = df['Date'], y = df[i], name = i)
fig.update_traces(line_width = 5)
fig.update_layout({'plot_bgcolor': "white"})
fig.show()
# Plot closing prices using plotly Express. Note that we used the same pre-defined function "plot_financial_data"
plot_financial_data(monthly_returns_df, 'Adjusted Closing Prices [$]')
# Plot the stocks daily returns
plot_financial_data(monthly_returns_df, 'Percentage Monthy Returns [%]')
# Plot histograms for stocks monthly returns using plotly express
fig = px.histogram(monthly_returns_df.drop(columns = ['Date']))
fig.update_layout({'plot_bgcolor': "white"})
# Plot a heatmap showing the correlations between monthly returns
# Strong positive correlations between Catterpillar and John Deere - both into heavy equipment and machinery
# MSFT and Google - both into Tech and Cloud Computing
plt.figure(figsize = (10, 8))
sns.heatmap(monthly_returns_df.drop(columns = ['Date']).corr(), annot = True);
# Plot the Pairplot between stocks monthly returns
sns.pairplot(monthly_returns_df);
# Function to scale stock prices based on their initial starting price
# The objective of this function is to set all prices to start at a value of 1
def price_scaling(raw_prices_df):
scaled_prices_df = raw_prices_df.copy()
for i in raw_prices_df.columns[1:]:
scaled_prices_df[i] = raw_prices_df[i]/raw_prices_df[i][0]
return scaled_prices_df
price_scaling(adj_close_data)
| Date | ADBE | AMZN | CAT | DE | EXC | JNJ | JPM | MSFT | PFE | PG | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-01-02 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | 2004-01-05 | 1.017903 | 1.026397 | 1.027223 | 1.001400 | 1.004546 | 1.005807 | 0.998089 | 1.025137 | 1.026723 | 1.004950 |
| 2 | 2004-01-06 | 1.018926 | 1.021773 | 1.009075 | 0.973562 | 0.997273 | 1.001355 | 1.023212 | 1.028780 | 1.026160 | 0.999899 |
| 3 | 2004-01-07 | 1.009025 | 1.000000 | 0.998064 | 0.998445 | 1.003334 | 1.001742 | 1.038231 | 1.027687 | 1.034880 | 0.991009 |
| 4 | 2004-01-08 | 0.972184 | 0.968015 | 0.990321 | 0.991291 | 1.003485 | 1.006388 | 1.055980 | 1.025865 | 1.025316 | 1.001414 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5028 | 2023-12-22 | 30.676188 | 59.121386 | 11.689238 | 18.202732 | 3.157575 | 5.256253 | 7.848499 | 21.817856 | 1.822835 | 5.079738 |
| 5029 | 2023-12-26 | 30.651084 | 59.117534 | 11.901397 | 18.406860 | 3.181754 | 5.279244 | 7.894915 | 21.822518 | 1.823477 | 5.102816 |
| 5030 | 2023-12-27 | 30.539395 | 59.090557 | 12.001639 | 18.423833 | 3.171008 | 5.286345 | 7.942268 | 21.788152 | 1.836314 | 5.107011 |
| 5031 | 2023-12-28 | 30.510704 | 59.105974 | 11.951719 | 18.381937 | 3.210411 | 5.294121 | 7.984465 | 21.858628 | 1.847867 | 5.095472 |
| 5032 | 2023-12-29 | 30.566034 | 58.551060 | 11.903007 | 18.410020 | 3.214888 | 5.299531 | 7.975088 | 21.902896 | 1.847867 | 5.123794 |
5033 rows × 11 columns
# Let's create an array that holds random portfolio weights
# Note that portfolio weights must add up to 1
import random
def generate_portfolio_weights(n):
weights = []
for i in range(n):
weights.append(random.random())
# let's make the sum of all weights add up to 1
weights = weights/np.sum(weights)
return weights
# Call the function (Run this cell multiple times to generate different outputs)
weights = generate_portfolio_weights(10)
print(weights)
[0.07813861 0.1442829 0.04616586 0.14572471 0.00682053 0.1134458 0.13514778 0.1435415 0.08587131 0.10086099]
# Let's define the "weights" list similar to the slides
weights = [0.07813861, 0.1442829, 0.04616586, 0.14572471, 0.00682053, 0.1134458, 0.13514778, 0.1435415, 0.08587131, 0.10086099]
weights
[0.07813861, 0.1442829, 0.04616586, 0.14572471, 0.00682053, 0.1134458, 0.13514778, 0.1435415, 0.08587131, 0.10086099]
# Let's display "adj_close_data" Pandas DataFrame
adj_close_data
| Date | ADBE | AMZN | CAT | DE | EXC | JNJ | JPM | MSFT | PFE | PG | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-01-02 | 19.518396 | 2.595000 | 24.635588 | 21.555693 | 10.938865 | 29.117960 | 20.959133 | 17.105961 | 15.121106 | 28.238073 |
| 1 | 2004-01-05 | 19.867832 | 2.663500 | 25.306253 | 21.585871 | 10.988595 | 29.287048 | 20.919071 | 17.535954 | 15.525189 | 28.377855 |
| 2 | 2004-01-06 | 19.887802 | 2.651500 | 24.859148 | 20.985796 | 10.909038 | 29.157417 | 21.445629 | 17.598263 | 15.516676 | 28.235224 |
| 3 | 2004-01-07 | 19.694548 | 2.595000 | 24.587902 | 21.522177 | 10.975337 | 29.168690 | 21.760416 | 17.579573 | 15.648537 | 27.984182 |
| 4 | 2004-01-08 | 18.975481 | 2.512000 | 24.397137 | 21.367962 | 10.976991 | 29.303968 | 22.132431 | 17.548412 | 15.503916 | 28.278000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5028 | 2023-12-22 | 598.750000 | 153.419998 | 287.971252 | 392.372498 | 34.540291 | 153.051361 | 164.497726 | 373.215393 | 27.563284 | 143.442017 |
| 5029 | 2023-12-26 | 598.260010 | 153.410004 | 293.197906 | 396.772614 | 34.804779 | 153.720825 | 165.470566 | 373.295135 | 27.572990 | 144.093689 |
| 5030 | 2023-12-27 | 596.080017 | 153.339996 | 295.667419 | 397.138489 | 34.687229 | 153.927582 | 166.463058 | 372.707275 | 27.767097 | 144.212158 |
| 5031 | 2023-12-28 | 595.520020 | 153.380005 | 294.437622 | 396.235382 | 35.118248 | 154.154007 | 167.347473 | 373.912842 | 27.941795 | 143.886322 |
| 5032 | 2023-12-29 | 596.599976 | 151.940002 | 293.237579 | 396.840729 | 35.167229 | 154.311539 | 167.150925 | 374.670074 | 27.941795 | 144.686081 |
5033 rows × 11 columns
# Scale stock prices using the "price_scaling" function that we defined earlier (make all stock values start at 1)
portfolio_df = adj_close_data.copy()
scaled_df = price_scaling(portfolio_df)
scaled_df
| Date | ADBE | AMZN | CAT | DE | EXC | JNJ | JPM | MSFT | PFE | PG | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-01-02 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | 2004-01-05 | 1.017903 | 1.026397 | 1.027223 | 1.001400 | 1.004546 | 1.005807 | 0.998089 | 1.025137 | 1.026723 | 1.004950 |
| 2 | 2004-01-06 | 1.018926 | 1.021773 | 1.009075 | 0.973562 | 0.997273 | 1.001355 | 1.023212 | 1.028780 | 1.026160 | 0.999899 |
| 3 | 2004-01-07 | 1.009025 | 1.000000 | 0.998064 | 0.998445 | 1.003334 | 1.001742 | 1.038231 | 1.027687 | 1.034880 | 0.991009 |
| 4 | 2004-01-08 | 0.972184 | 0.968015 | 0.990321 | 0.991291 | 1.003485 | 1.006388 | 1.055980 | 1.025865 | 1.025316 | 1.001414 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5028 | 2023-12-22 | 30.676188 | 59.121386 | 11.689238 | 18.202732 | 3.157575 | 5.256253 | 7.848499 | 21.817856 | 1.822835 | 5.079738 |
| 5029 | 2023-12-26 | 30.651084 | 59.117534 | 11.901397 | 18.406860 | 3.181754 | 5.279244 | 7.894915 | 21.822518 | 1.823477 | 5.102816 |
| 5030 | 2023-12-27 | 30.539395 | 59.090557 | 12.001639 | 18.423833 | 3.171008 | 5.286345 | 7.942268 | 21.788152 | 1.836314 | 5.107011 |
| 5031 | 2023-12-28 | 30.510704 | 59.105974 | 11.951719 | 18.381937 | 3.210411 | 5.294121 | 7.984465 | 21.858628 | 1.847867 | 5.095472 |
| 5032 | 2023-12-29 | 30.566034 | 58.551060 | 11.903007 | 18.410020 | 3.214888 | 5.299531 | 7.975088 | 21.902896 | 1.847867 | 5.123794 |
5033 rows × 11 columns
# Use enumerate() method to obtain the stock names along with a counter "i" (0, 1, 2, 3,..etc.)
# This counter "i" will be used as an index to access elements in the "weights" list
initial_investment = 10000000
for i, stock in enumerate(scaled_df.columns[1:]):
portfolio_df[stock] = weights[i] * scaled_df[stock] * initial_investment
portfolio_df.round(2)
| Date | ADBE | AMZN | CAT | DE | EXC | JNJ | JPM | MSFT | PFE | PG | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-01-02 | 781386.10 | 1442829.00 | 461658.60 | 1457247.10 | 68205.30 | 1134458.00 | 1351477.80 | 1435415.00 | 858713.10 | 1008609.90 |
| 1 | 2004-01-05 | 795375.17 | 1480915.26 | 474226.54 | 1459287.25 | 68515.38 | 1141045.81 | 1348894.54 | 1471497.04 | 881660.60 | 1013602.64 |
| 2 | 2004-01-06 | 796174.64 | 1474243.17 | 465848.01 | 1418719.91 | 68019.32 | 1135995.29 | 1382847.82 | 1476725.60 | 881177.13 | 1008508.12 |
| 3 | 2004-01-07 | 788438.02 | 1442829.00 | 460765.00 | 1454981.30 | 68432.71 | 1136434.47 | 1403145.78 | 1475157.26 | 888665.37 | 999541.40 |
| 4 | 2004-01-08 | 759651.40 | 1396680.74 | 457190.15 | 1444555.78 | 68443.02 | 1141705.03 | 1427133.89 | 1472542.50 | 880452.49 | 1010036.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5028 | 2023-12-22 | 23969947.03 | 85302050.13 | 5396437.34 | 26525878.47 | 215363.38 | 5962998.13 | 10607071.57 | 31317677.99 | 1565292.43 | 5123474.12 |
| 5029 | 2023-12-26 | 23950331.10 | 85296493.15 | 5494382.22 | 26823343.10 | 217012.50 | 5989080.97 | 10669801.78 | 31324369.43 | 1565843.65 | 5146750.61 |
| 5030 | 2023-12-27 | 23863058.77 | 85257568.84 | 5540659.66 | 26848077.67 | 216279.56 | 5997136.36 | 10733799.27 | 31275040.23 | 1576866.81 | 5150982.10 |
| 5031 | 2023-12-28 | 23840640.21 | 85279813.73 | 5517613.87 | 26787024.21 | 218967.02 | 6005958.06 | 10790827.71 | 31376203.10 | 1586787.73 | 5139343.86 |
| 5032 | 2023-12-29 | 23883874.43 | 84479167.39 | 5495125.67 | 26827947.95 | 219272.42 | 6012095.62 | 10778153.96 | 31439744.88 | 1586787.73 | 5167909.72 |
5033 rows × 11 columns
# Assume that we have $10,000,000 that we would like to invest in one or more of the selected stocks
# Let's create a function that receives the following arguments:
# (1) Stocks closing prices
# (2) Random weights
# (3) Initial investment amount
# The function will return a DataFrame that contains the following:
# (1) Daily value (position) of each individual stock over the specified time period
# (2) Total daily value of the portfolio
# (3) Percentage daily return
def asset_allocation(df, weights, initial_investment):
portfolio_df = df.copy()
# Scale stock prices using the "price_scaling" function that we defined earlier (Make them all start at 1)
scaled_df = price_scaling(df)
for i, stock in enumerate(scaled_df.columns[1:]):
portfolio_df[stock] = scaled_df[stock] * weights[i] * initial_investment
# Sum up all values and place the result in a new column titled "portfolio value [$]"
# Note that we excluded the date column from this calculation
portfolio_df['Portfolio Value [$]'] = portfolio_df[portfolio_df != 'Date'].sum(axis = 1, numeric_only = True)
# Calculate the portfolio percentage monthly return and replace NaNs with zeros
portfolio_df['Portfolio monthly Return [%]'] = portfolio_df['Portfolio Value [$]'].pct_change(1) * 100
portfolio_df.replace(np.nan, 0, inplace = True)
return portfolio_df
portfolio_df
| Date | ADBE | AMZN | CAT | DE | EXC | JNJ | JPM | MSFT | PFE | PG | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-01-02 | 7.813861e+05 | 1.442829e+06 | 4.616586e+05 | 1.457247e+06 | 68205.300000 | 1.134458e+06 | 1.351478e+06 | 1.435415e+06 | 8.587131e+05 | 1.008610e+06 |
| 1 | 2004-01-05 | 7.953752e+05 | 1.480915e+06 | 4.742265e+05 | 1.459287e+06 | 68515.375148 | 1.141046e+06 | 1.348895e+06 | 1.471497e+06 | 8.816606e+05 | 1.013603e+06 |
| 2 | 2004-01-06 | 7.961746e+05 | 1.474243e+06 | 4.658480e+05 | 1.418720e+06 | 68019.323888 | 1.135995e+06 | 1.382848e+06 | 1.476726e+06 | 8.811771e+05 | 1.008508e+06 |
| 3 | 2004-01-07 | 7.884380e+05 | 1.442829e+06 | 4.607650e+05 | 1.454981e+06 | 68432.709848 | 1.136434e+06 | 1.403146e+06 | 1.475157e+06 | 8.886654e+05 | 9.995414e+05 |
| 4 | 2004-01-08 | 7.596514e+05 | 1.396681e+06 | 4.571901e+05 | 1.444556e+06 | 68443.020712 | 1.141705e+06 | 1.427134e+06 | 1.472542e+06 | 8.804525e+05 | 1.010036e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5028 | 2023-12-22 | 2.396995e+07 | 8.530205e+07 | 5.396437e+06 | 2.652588e+07 | 215363.381961 | 5.962998e+06 | 1.060707e+07 | 3.131768e+07 | 1.565292e+06 | 5.123474e+06 |
| 5029 | 2023-12-26 | 2.395033e+07 | 8.529649e+07 | 5.494382e+06 | 2.682334e+07 | 217012.501764 | 5.989081e+06 | 1.066980e+07 | 3.132437e+07 | 1.565844e+06 | 5.146751e+06 |
| 5030 | 2023-12-27 | 2.386306e+07 | 8.525757e+07 | 5.540660e+06 | 2.684808e+07 | 216279.562272 | 5.997136e+06 | 1.073380e+07 | 3.127504e+07 | 1.576867e+06 | 5.150982e+06 |
| 5031 | 2023-12-28 | 2.384064e+07 | 8.527981e+07 | 5.517614e+06 | 2.678702e+07 | 218967.022931 | 6.005958e+06 | 1.079083e+07 | 3.137620e+07 | 1.586788e+06 | 5.139344e+06 |
| 5032 | 2023-12-29 | 2.388387e+07 | 8.447917e+07 | 5.495126e+06 | 2.682795e+07 | 219272.424297 | 6.012096e+06 | 1.077815e+07 | 3.143974e+07 | 1.586788e+06 | 5.167910e+06 |
5033 rows × 11 columns
# Now let's put this code in a function and generate random weights
# Let's obtain the number of stocks under consideration (note that we ignored the "Date" column)
n = len(adj_close_data.columns)-1
# Let's generate random weights
print('Number of stocks under consideration = {}'.format(n))
weights = generate_portfolio_weights(n).round(6)
print('Portfolio weights = {}'.format(weights))
# Let's test out the "asset_allocation" function
portfolio_df = asset_allocation(adj_close_data, weights, 10000000)
portfolio_df.round(2)
Number of stocks under consideration = 10 Portfolio weights = [0.130612 0.140131 0.069716 0.137146 0.105677 0.035927 0.144462 0.084744 0.068839 0.082745]
| Date | ADBE | AMZN | CAT | DE | EXC | JNJ | JPM | MSFT | PFE | PG | Portfolio Value [$] | Portfolio Daily Return [%] | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-01-02 | 1306120.00 | 1401310.00 | 697160.00 | 1371460.00 | 1056770.00 | 359270.00 | 1444620.00 | 847440.00 | 688390.00 | 827450.00 | 9.999990e+06 | 0.00 |
| 1 | 2004-01-05 | 1329503.33 | 1438300.29 | 716139.10 | 1373380.05 | 1061574.29 | 361356.29 | 1441858.71 | 868742.11 | 706785.93 | 831545.98 | 1.012919e+07 | 1.29 |
| 2 | 2004-01-06 | 1330839.67 | 1431820.19 | 703486.51 | 1335200.88 | 1053888.49 | 359756.84 | 1478152.01 | 871828.95 | 706398.36 | 827366.50 | 1.009874e+07 | -0.30 |
| 3 | 2004-01-07 | 1317907.58 | 1401310.00 | 695810.55 | 1369327.58 | 1060293.48 | 359895.93 | 1499848.87 | 870903.03 | 712401.33 | 820010.32 | 1.010771e+07 | 0.09 |
| 4 | 2004-01-08 | 1269789.53 | 1356489.71 | 690412.10 | 1359515.81 | 1060453.23 | 361565.05 | 1525490.21 | 869359.32 | 705817.45 | 828619.95 | 1.002751e+07 | -0.79 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5028 | 2023-12-22 | 40066782.89 | 82847389.31 | 8149269.30 | 24964318.88 | 3336831.03 | 1888413.97 | 11338098.00 | 18489324.02 | 1254821.49 | 4203229.28 | 1.965385e+08 | 0.01 |
| 5029 | 2023-12-26 | 40033994.02 | 82841992.24 | 8297177.85 | 25244271.98 | 3362382.42 | 1896674.11 | 11405151.50 | 18493274.51 | 1255263.38 | 4222325.00 | 1.970525e+08 | 0.26 |
| 5030 | 2023-12-27 | 39888114.62 | 82804188.02 | 8367062.35 | 25267550.44 | 3351026.28 | 1899225.16 | 11473559.61 | 18464151.55 | 1264100.13 | 4225796.46 | 1.970048e+08 | -0.02 |
| 5031 | 2023-12-28 | 39850641.05 | 82825792.78 | 8332260.43 | 25210091.15 | 3392665.68 | 1902018.90 | 11534518.39 | 18523876.06 | 1272053.27 | 4216248.60 | 1.970602e+08 | 0.03 |
| 5032 | 2023-12-29 | 39922908.88 | 82048185.92 | 8298300.55 | 25248605.75 | 3397397.56 | 1903962.59 | 11520971.17 | 18561389.84 | 1272053.27 | 4239683.64 | 1.964135e+08 | -0.33 |
5033 rows × 13 columns
# Plot the portfolio percentage monthly return
plot_financial_data(portfolio_df[['Date', 'Portfolio Daily Return [%]']], 'Portfolio Percentage monthly Return [%]')
# Plot each stock position in our portfolio over time
# This graph shows how our initial investment in each individual stock grows over time
plot_financial_data(portfolio_df.drop(['Portfolio Value [$]', 'Portfolio Daily Return [%]'], axis = 1), 'Portfolio positions [$]')
# Plot the total daily value of the portfolio (sum of all positions)
plot_financial_data(portfolio_df[['Date', 'Portfolio Value [$]']], 'Total Portfolio Value [$]')